44 research outputs found

Text2Action: Generative Adversarial Synthesis from Language to Action

Author: Ahn Hyemin
Choi Yunho
Ha Timothy
Oh Songhwai
Yoo Hwiyeon
Publication date: 24/10/2017

In this paper, we propose a generative model which learns the relationship between language and human action in order to generate a human action sequence given a sentence describing human behavior. The proposed generative model is a generative adversarial network (GAN), which is based on the sequence to sequence (SEQ2SEQ) model. Using the proposed generative network, we can synthesize various actions for a robot or a virtual agent using a text encoder recurrent neural network (RNN) and an action decoder RNN. The proposed generative network is trained from 29,770 pairs of actions and sentence annotations extracted from MSR-Video-to-Text (MSR-VTT), a large-scale video dataset. We demonstrate that the network can generate human-like actions which can be transferred to a Baxter robot, such that the robot performs an action based on a provided sentence. Results show that the proposed generative network correctly models the relationship between language and action and can generate a diverse set of actions from the same sentence.
Comment: 8 pages, 10 figures

arXiv.org e-Print Archive

Crossref

ScholarWorks@UNIST

Interactive Text2Pickup Network for Natural Language based Human-Robot Collaboration

Author: Ahn Hyemin
Cha Geonho
Choi Sungjoon
Kim Nuri
Oh Songhwai
Publication date: 28/05/2018

In this paper, we propose the Interactive Text2Pickup (IT2P) network for human-robot collaboration, which enables effective interaction with a human user despite ambiguity in the user's commands. We focus on the task where a robot is expected to pick up an object indicated by a human, and to interact with the human when the given instruction is vague. The proposed network first interprets the command from the human user and estimates the position of the desired object. To handle the inherent ambiguity in human language commands, it generates a suitable question that can resolve the ambiguity. The user's answer to the question is combined with the initial command and fed back to the network, resulting in a more accurate estimate. The experimental results show that, given unambiguous commands, the proposed method can estimate the position of the requested object with an accuracy of 98.49% on our test dataset. Given ambiguous language commands, we show that the accuracy of the pick-up task increases by a factor of 1.94 after incorporating the information obtained from the interaction.
Comment: 8 pages, 9 figures

arXiv.org e-Print Archive

ScholarWorks@UNIST

A Unified Masked Autoencoder with Patchified Skeletons for Motion Synthesis

Author: Ahn Hyemin
Lee Dongheui
Mascaro Esteve Valls
Publication date: 14/08/2023

The synthesis of human motion has traditionally been addressed through task-dependent models that focus on specific challenges, such as predicting future motions or filling in intermediate poses conditioned on known key-poses. In this paper, we present a novel task-independent model called UNIMASK-M, which can effectively address these challenges using a unified architecture. Our model obtains comparable or better performance than the state-of-the-art in each field. Inspired by Vision Transformers (ViTs), our UNIMASK-M model decomposes a human pose into body parts to leverage the spatio-temporal relationships existing in human motion. Moreover, we reformulate various pose-conditioned motion synthesis tasks as a reconstruction problem with different masking patterns given as input. By explicitly informing our model about the masked joints, our UNIMASK-M becomes more robust to occlusions. Experimental results show that our model successfully forecasts human motion on the Human3.6M dataset. Moreover, it achieves state-of-the-art results in motion inbetweening on the LaFAN1 dataset, particularly in long transition periods. More information can be found on the project website https://sites.google.com/view/estevevallsmascaro/publications/unimask-m

arXiv.org e-Print Archive
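
The UNIMASK-M abstract above describes casting different motion synthesis tasks as one reconstruction problem that differs only in the masking pattern. The following is a minimal sketch of that idea at the frame level only, not the authors' implementation: the function names, the boolean-list representation, and the convention that `True` means "masked, to be reconstructed" are all assumptions made for illustration.

```python
from typing import List


def forecasting_mask(num_frames: int, num_observed: int) -> List[bool]:
    """Mask every frame after the observed prefix (True = masked)."""
    if not 0 <= num_observed <= num_frames:
        raise ValueError("num_observed must lie between 0 and num_frames")
    return [t >= num_observed for t in range(num_frames)]


def inbetweening_mask(num_frames: int, key_frames: List[int]) -> List[bool]:
    """Mask every frame except the given key-pose frames (True = masked)."""
    keys = set(key_frames)
    if any(k < 0 or k >= num_frames for k in keys):
        raise ValueError("key frame index out of range")
    return [t not in keys for t in range(num_frames)]


if __name__ == "__main__":
    # Hypothetical 6-frame sequence: observe 2 frames, predict the rest.
    print(forecasting_mask(6, 2))
    # Same sequence: keep the first and last poses, fill in the middle.
    print(inbetweening_mask(6, [0, 5]))
```

Under this assumed representation, a single reconstruction model would receive the same kind of input for both tasks, and only the mask would change.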

Generative Autoregressive Networks for 3D Dancing Move Synthesis from Music

Author: Ahn Hyemin
Kim Jaehun
Kim Kihyun
Oh Songhwai
Publication date: 10/11/2019

This paper proposes a framework that generates a sequence of three-dimensional human dance poses for a given piece of music. The proposed framework consists of three components: a music feature encoder, a pose generator, and a music genre classifier. We focus on integrating these components to generate realistic 3D human dance moves from music, which can be applied to artificial agents and humanoid robots. The trained dance pose generator, which is a generative autoregressive model, is able to synthesize a dance sequence longer than 5,000 pose frames. Experimental results on dance sequences generated from various songs show how the proposed method generates human-like dance moves for a given piece of music. In addition, a generated 3D dance sequence is applied to a humanoid robot, showing that the proposed framework can make a robot dance just by listening to music.
Comment: 8 pages, 10 figures

arXiv.org e-Print Archive

ScholarWorks@UNIST

Self-Supervised Motion Retargeting with Safety Guarantee

Author: Ahn Hyemin
Choi Sungjoon
Kim Joohyung
Song Min Jae
Publication date: 10/03/2021

In this paper, we present self-supervised shared latent embedding (S3LE), a data-driven motion retargeting method that enables the generation of natural motions in humanoid robots from motion capture data or RGB videos. While it requires paired data consisting of human poses and their corresponding robot configurations, it significantly alleviates the need for time-consuming data collection via novel paired-data generation processes. Our self-supervised learning procedure consists of two steps: automatically generating paired data to bootstrap the motion retargeting, and learning a projection-invariant mapping to handle the different expressivity of humans and humanoid robots. Furthermore, our method guarantees that the generated robot pose is collision-free and satisfies position limits by utilizing nonparametric regression in the shared latent space. We demonstrate that our method can generate expressive robotic motions from both the CMU motion capture database and YouTube videos.

arXiv.org e-Print Archive

ScholarWorks@UNIST

Human-Object Interaction Prediction in Videos through Gaze Following

Author: Ahn Hyemin
Lee Dongheui
Mascaró Esteve Valls
Ni Zhifan
Publication venue: Elsevier BV
Publication date: 29/05/2023

Understanding human-object interactions (HOIs) in a video is essential to fully comprehend a visual scene. This line of research has been addressed by detecting HOIs from images and, more recently, from videos. However, the video-based HOI anticipation task in the third-person view remains understudied. In this paper, we design a framework to detect current HOIs and anticipate future HOIs in videos. We propose to leverage human gaze information, since people often fixate on an object before interacting with it. These gaze features, together with the scene contexts and the visual appearances of human-object pairs, are fused through a spatio-temporal transformer. To evaluate the model in the HOI anticipation task in a multi-person scenario, we propose a set of person-wise multi-label metrics. Our model is trained and validated on the VidHOI dataset, which contains videos capturing daily life and is currently the largest video HOI dataset. Experimental results in the HOI detection task show that our approach improves on the baseline by a relative margin of 36.3%. Moreover, we conduct an extensive ablation study to demonstrate the effectiveness of our modifications and extensions to the spatio-temporal transformer. Our code is publicly available at https://github.com/nizhf/hoi-prediction-gaze-transformer.
Comment: Accepted by CVIU https://doi.org/10.1016/j.cviu.2023.10374

arXiv.org e-Print Archive

Institute of Transport Research:Publications

ScholarWorks@UNIST

Robust Human Motion Forecasting using Transformer-based Model

Author: Ahn Hyemin
Lee Dongheui
Ma Shuo
Mascaro Esteve Valls
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 23/10/2022

Comprehending human motion is a fundamental challenge for developing Human-Robot Collaborative applications. Computer vision researchers have addressed this field by focusing only on reducing prediction error, without taking into account the requirements for deploying such models on robots. In this paper, we propose a new Transformer-based model that simultaneously handles real-time 3D human motion forecasting in the short and long term. Our 2-Channel Transformer (2CH-TR) efficiently exploits the spatio-temporal information of a briefly observed sequence (400ms) and achieves accuracy competitive with the current state-of-the-art. 2CH-TR stands out for its efficiency, being lighter and faster than its competitors. In addition, our model is tested in conditions where the human motion is severely occluded, demonstrating its robustness in reconstructing and predicting 3D human motion in a highly noisy environment. Our experimental results show that the proposed 2CH-TR outperforms the ST-Transformer, another state-of-the-art Transformer-based model, in terms of reconstruction and prediction under the same input prefix conditions. Our model reduces the mean squared error of ST-Transformer by 8.89% in short-term prediction and by 2.57% in long-term prediction on the Human3.6M dataset with a 400ms input prefix.
Comment: This paper has been already accepted to the 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2022)

arXiv.org e-Print Archive

Institute of Transport Research:Publications

ScholarWorks@UNIST
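
The 2CH-TR abstract above reports its gains as percentage reductions in mean squared error relative to a baseline (8.89% short-term, 2.57% long-term). As a minimal sketch, assuming the common definition of relative reduction (the abstract does not spell out its formula), the arithmetic looks like this. The function name `relative_reduction` and the input values are hypothetical and are not results from the paper.

```python
def relative_reduction(baseline_error: float, new_error: float) -> float:
    """Return the percentage by which new_error is lower than baseline_error."""
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return (baseline_error - new_error) / baseline_error * 100.0


if __name__ == "__main__":
    # Made-up errors for illustration only: baseline 200.0, new model 150.0.
    print(relative_reduction(200.0, 150.0))
```

A negative return value would indicate that the new model's error is higher than the baseline's.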

Visually Grounding Instruction for History-Dependent Manipulation

Author: Ahn Hyemin
Jeong Jaeyeon
Jun Howoong
Kim Kyungdo
Kwon Obin
Lee Dongheui
Lee Hongjung
Oh Songhwai
Publication date: 16/12/2020

This paper emphasizes the importance of a robot's ability to refer to its task history when it executes a series of pick-and-place manipulations by following text instructions given one by one. The advantage of referring to the manipulation history is twofold: (1) instructions that omit details or use co-referential expressions can be interpreted, and (2) the visual information of objects occluded by previous manipulations can be inferred. To address this challenge, we introduce the task of history-dependent manipulation, which is to visually ground a series of text instructions for proper manipulations depending on the task history. We also suggest a relevant dataset and a methodology based on a deep neural network, and show that our network trained with a synthetic dataset can be applied to the real world by transferring real images into a synthetic style using CycleGAN.
Comment: 8 pages, 6 figures

arXiv.org e-Print Archive

Institute of Transport Research:Publications

ScholarWorks@UNIST

Expression, Immobilization and Enzymatic Properties of Glutamate Decarboxylase Fused to a Cellulose-Binding Domain

Author: Ahn Jungoh
Jung Joon-Ki
Kim Chunsuk
Lee Eun Gyo
Lee Hongweon
Lee Hyeokwon
Lee Juwhan
Park Hyemin
Publication venue: Molecular Diversity Preservation International (MDPI)
Publication date: 01/01/2011

Escherichia coli-derived glutamate decarboxylase (GAD), an enzyme that catalyzes the conversion of glutamic acid to gamma-aminobutyric acid (GABA), was fused to the cellulose-binding domain (CBD) and a linker of Trichoderma harzianum endoglucanase II. To prevent proteolysis of the fusion protein, the native linker was replaced with an S3N10 peptide known to be completely resistant to E. coli endopeptidase. The CBD-GAD expressed in E. coli was successfully immobilized on Avicel, a crystalline cellulose, with a binding capacity of 33 ± 2 nmol CBD-GAD/g Avicel, and the immobilized enzymes retained 60% of their initial activities after 10 uses. The results of this report provide a feasible alternative for producing GABA using immobilized GAD through fusion to CBD.

Multidisciplinary Digital Publishing Institute

CiteSeerX

Directory of Open Access Journals

PubMed Central

Visually Grounding Language Instruction for History-Dependent Manipulation

Author: Ahn Hyemin
Jeong Jaeyeon
Jun Howoong
Kim Kyungdo
Kwon Obin
Lee Dongheui
Lee Hongjung
Oh Songhwai
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/05/2022

This paper emphasizes the importance of a robot's ability to refer to its task history, especially when it executes a series of pick-and-place manipulations by following language instructions given one by one. The advantage of referring to the manipulation history is twofold: (1) language instructions that omit details but use expressions referring to the past can be interpreted, and (2) the visual information of objects occluded by previous manipulations can be inferred. For this, we introduce a history-dependent manipulation task whose objective is to visually ground a series of language instructions for proper pick-and-place manipulations by referring to the past. We also suggest a relevant dataset and a model that can serve as a baseline, and show that our model trained with the proposed dataset can also be applied to the real world based on the CycleGAN. Our dataset and code are publicly available on the project website: https://sites.google.com/view/history-dependent-manipulation

Institute of Transport Research:Publications